128 research outputs found

    MINER: software for phylogenetic motif identification

    MINER is web-based software for phylogenetic motif (PM) identification. PMs are sequence regions (fragments) that conserve the overall familial phylogeny. PMs have been shown to correspond to a wide variety of catalytic regions, substrate-binding sites and protein interfaces, making them ideal functional site predictions. The MINER output provides an intuitive interface for interactive PM sequence analysis and structural visualization. The web implementation of MINER is freely available at . Source code is available to the academic community on request.

    Predicting functional sites with an automated algorithm suitable for heterogeneous datasets

    BACKGROUND: In a previous report (La et al., Proteins, 2005), we demonstrated that the identification of phylogenetic motifs (protein sequence fragments that conserve the overall familial phylogeny) represents a promising approach for sequence/function annotation. Across a structurally and functionally heterogeneous dataset, phylogenetic motifs have been demonstrated to correspond to a wide variety of functional site archetypes, including those defined by surface loops, active site clefts, and less exposed regions. However, in our original demonstration of the technique, phylogenetic motif identification depended upon a manually determined similarity threshold, prohibiting large-scale application of the technique. RESULTS: In this report, we present an algorithmic approach that determines thresholds without human subjectivity. The approach relies on significant raw data preprocessing to improve signal detection. Subsequently, Partition Around Medoids Clustering (PAMC) of the similarity scores assesses sequence fragments where functional annotation remains in question. The accuracy of the approach is confirmed through comparisons to our previous (manual) results and structural analyses. Triosephosphate isomerase and arginyl-tRNA synthetase are discussed as exemplar cases. A quantitative functional site prediction assessment algorithm indicates that the phylogenetic motif predictions, which require sequence information only, are nearly as good as those from evolutionary trace methods that do incorporate structure. CONCLUSION: The automated threshold detection algorithm has been incorporated into MINER, our web-based phylogenetic motif identification server. MINER is freely available on the web at . Pre-calculated functional site predictions of the COG database and an implementation of the threshold detection algorithm, in the R statistical language, can also be accessed at the website.

    Searching for evolutionary distant RNA homologs within genomic sequences using partition function posterior probabilities

    BACKGROUND: Identification of RNA homologs within genomic stretches is difficult when pairwise sequence identity is low or unalignable flanking residues are present. In both cases structure-sequence or profile/family-sequence alignment programs become difficult to apply because of unreliable RNA structures or family alignments. As such, local sequence-sequence alignment programs are frequently used instead. We have recently demonstrated that maximal expected accuracy alignments using partition function match probabilities (implemented in Probalign) are significantly better than contemporary methods on heterogeneous length protein sequence datasets, thus suggesting an affinity for local alignment. RESULTS: We create a pairwise RNA-genome alignment benchmark from RFAM families with average pairwise sequence identity up to 60%. Each dataset contains a query RNA aligned to a target RNA (of the same family) embedded in a genomic sequence at least 5K nucleotides long. To simulate common conditions when exact ends of an ncRNA are unknown, each query RNA has 5' and 3' genomic flanks of size 50, 100, and 150 nucleotides. We subsequently compare the error of the Probalign program (adjusted for local alignment) to the commonly used local alignment programs HMMER, SSEARCH, and BLAST, and the popular ClustalW program with zero end-gap penalties. Parameters were optimized for each program on a small subset of the benchmark. Probalign has the highest overall accuracy on the full benchmark. It leads by 10% accuracy over SSEARCH (the next best method) on 5 out of 22 families. On datasets restricted to a maximum of 30% sequence identity, Probalign's overall median error is 71.2% vs. 83.4% for SSEARCH (P-value < 0.05). Furthermore, on these datasets Probalign leads SSEARCH by at least 10% on five families; SSEARCH leads Probalign by the same margin on two of the fourteen families. We also demonstrate that the Probalign mean posterior probability, compared to the normalized SSEARCH Z-score, is a better discriminator of alignment quality. All datasets and software are available online. CONCLUSION: We demonstrate, for the first time, that partition function match probabilities used for expected accuracy alignment, as done in Probalign, provide statistically significant improvement over current approaches for identifying distantly related RNA sequences in larger genomic segments.

    Coupling between Catalytic Loop Motions and Enzyme Global Dynamics

    Catalytic loop motions facilitate substrate recognition and binding in many enzymes. While these motions appear to be highly flexible, their functional significance suggests that structure-encoded preferences may play a role in selecting particular mechanisms of motion. We performed an extensive study on a set of enzymes to assess whether the collective/global dynamics, as predicted by elastic network models (ENMs), facilitate or even define the local motions undergone by functional loops. Our dataset includes a total of 117 crystal structures for ten enzymes of different sizes and oligomerization states. Each enzyme contains a specific functional/catalytic loop (10-21 residues long) that closes over the active site during catalysis. Principal component analysis (PCA) of the available crystal structures (including apo and ligand-bound forms) for each enzyme revealed the dominant conformational changes taking place in these loops upon substrate binding. These experimentally observed loop reconfigurations are shown to be predominantly driven by energetically favored modes of motion intrinsically accessible to the enzyme in the absence of its substrate. The analysis suggests that robust global modes cooperatively defined by the overall enzyme architecture also entail local components that assist in suitable opening/closure of the catalytic loop over the active site. © 2012 Kurkcuoglu et al.

    Redistribution of Flexibility in Stabilizing Antibody Fragment Mutants Follows Le Chatelier's Principle

    Le Châtelier's principle is the cornerstone of our understanding of chemical equilibria. When a system at equilibrium undergoes a change in concentration or thermodynamic state (e.g., temperature or pressure), Le Châtelier's principle states that an equilibrium shift will occur to offset the perturbation and a new equilibrium is established. We demonstrate that the effects of stabilizing mutations on the rigidity ⇔ flexibility equilibrium within the native state ensemble manifest themselves through enthalpy-entropy compensation as the protein structure adjusts to restore the global balance between the two. Specifically, we characterize the effects of mutation to single chain fragments of the anti-lymphotoxin-β receptor antibody using a computational Distance Constraint Model. Statistically significant changes in the distribution of both rigidity and flexibility within the molecular structure are typically observed, where the local perturbations often lead to distal shifts in flexibility and rigidity profiles. Nevertheless, the net gain or loss in flexibility of individual mutants can be skewed. Despite all mutants being exclusively stabilizing in this dataset, increased flexibility is slightly more common than increased rigidity. Mechanistically, the redistribution of flexibility is largely controlled by changes in the H-bond network. For example, a stabilizing mutation can induce an increase in rigidity locally due to the formation of new H-bonds, and simultaneously break H-bonds elsewhere, leading to increased flexibility distant from the mutation site via Le Châtelier's principle. Increased flexibility within the VH β4/β5 loop is a noteworthy illustration of this long-range effect.

    At the crossroads of biomacromolecular research: highlighting the interdisciplinary nature of the field

    Due to the complexity and wide-ranging utility of biomacromolecules, biomacromolecular research is an especially interdisciplinary branch of chemistry. It is my goal that the Biomacromolecules subject area of Chemistry Central Journal will parallel this richness and diversity. In this inaugural commentary, I attempt to set the stage for achieving this by highlighting several areas where biomacromolecular research overlaps more traditional chemistry sub-disciplines. Specifically, I discuss how Materials Science and Biotechnology, Analytical Chemistry, Cell Biology and Chemical Theory are each integral to modern biomacromolecular research. Investigators with reports in any of these areas, or any other dealing with biomacromolecules, are encouraged to submit their research papers to Chemistry Central Journal.

    How accurate and statistically robust are catalytic site predictions based on closeness centrality?

    BACKGROUND: We examine the accuracy of enzyme catalytic residue predictions from a network representation of protein structure. In this model, amino acid α-carbons specify vertices within a graph and edges connect vertices that are proximal in structure. Closeness centrality, which has shown promise in previous investigations, is used to identify important positions within the network. Closeness centrality, a global measure of network centrality, is calculated as the reciprocal of the average distance between vertex i and all other vertices. RESULTS: We benchmark the approach against 283 structurally unique proteins within the Catalytic Site Atlas. Our results, which are in line with previous investigations of smaller datasets, indicate closeness centrality predictions are statistically significant. However, unlike previous approaches, we specifically focus on residues with the very best scores. Over the top five closeness centrality scores, we observe an average true to false positive rate ratio of 6.8 to 1. As demonstrated previously, adding a solvent accessibility filter significantly improves predictive power; the average ratio is increased to 15.3 to 1. We also demonstrate (for the first time) that filtering the predictions by residue identity improves the results even more than accessibility filtering. Here, we simply eliminate from consideration residues with physicochemical properties unlikely to be compatible with catalytic requirements. Residue identity filtering improves the average true to false positive rate ratio to 26.3 to 1. Combining the two filters has little effect on the results. Calculated p-values for the three prediction schemes range from 2.7E-9 to less than 8.8E-134. Finally, the sensitivity of the predictions to structure choice and slight perturbations is examined. CONCLUSION: Our results resolutely confirm that closeness centrality is a viable prediction scheme whose predictions are statistically significant. Simple filtering schemes substantially improve the method's predictive power. Moreover, no clear effect on performance is observed when comparing ligated and unligated structures. Similarly, the closeness centrality (CC) prediction results are robust to slight structural perturbations from molecular dynamics simulation.
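
The closeness centrality definition in the abstract above (the reciprocal of the average distance between a vertex and all other vertices, on a graph whose vertices are α-carbons) can be illustrated with a minimal sketch. This is not the authors' implementation. The 8.0 Å contact cutoff, the use of unweighted shortest-path (hop) distance, the handling of disconnected graphs, and the function names `contact_graph`, `closeness_centrality` and `top_ranked` are all assumptions made for illustration; the abstract only says that edges connect vertices that are "proximal in structure".

```python
from collections import deque
from math import dist


def contact_graph(coords, cutoff=8.0):
    """Build an undirected graph from alpha-carbon coordinates.

    Two residues share an edge when their alpha-carbons lie within
    `cutoff` angstroms. The 8.0 default is an assumed value, not one
    taken from the abstract.
    """
    n = len(coords)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def closeness_centrality(adj):
    """Return {vertex: closeness}, the reciprocal of the mean
    shortest-path distance from the vertex to every vertex it can reach.

    Distances are hop counts found by breadth-first search. Isolated
    vertices get 0.0 (an assumption; the abstract does not cover them).
    """
    scores = {}
    for source in adj:
        hops = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in hops:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        total = sum(hops.values())
        reachable = len(hops) - 1
        scores[source] = reachable / total if total > 0 else 0.0
    return scores


def top_ranked(scores, k=5):
    """Return the k vertices with the highest closeness, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling `top_ranked(closeness_centrality(contact_graph(coords)))` on a list of (x, y, z) tuples returns the five best-scoring residue indices, mirroring the abstract's focus on the top five scores. The solvent accessibility and residue identity filters described in the abstract are not part of this sketch.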

    Hydrogen bond networks determine emergent mechanical and thermodynamic properties across a protein family

    BACKGROUND: Gram-negative bacteria use periplasmic-binding proteins (bPBP) to transport nutrients through the periplasm. Despite immense diversity within the recognized substrates, all members of the family share a common fold that includes two domains that are separated by a conserved hinge. The hinge allows the protein to cycle between open (apo) and closed (ligated) conformations. Conformational changes within the proteins depend on a complex interplay of mechanical and thermodynamic response, which is manifested as an increase in thermal stability and decrease of flexibility upon ligand binding. RESULTS: We use a distance constraint model (DCM) to quantify the give and take between thermodynamic stability and mechanical flexibility across the bPBP family. Quantitative stability/flexibility relationships (QSFR) are readily evaluated because the DCM links mechanical and thermodynamic properties. We have previously demonstrated that QSFR is moderately conserved across a mesophilic/thermophilic RNase H pair, whereas the observed variance indicated that different enthalpy-entropy mechanisms allow similar mechanical response at their respective melting temperatures. Our predictions of heat capacity and free energy show marked diversity across the bPBP family. While backbone flexibility metrics are mostly conserved, cooperativity correlations (long-range couplings) also demonstrate a considerable amount of variation. Upon ligand removal, heat capacity, melting point, and mechanical rigidity are, as expected, lowered. Nevertheless, significant differences are found in molecular cooperativity correlations that can be explained by the detailed nature of the hydrogen bond network. CONCLUSION: Non-trivial mechanical and thermodynamic variation across the family is explained by differences within the underlying H-bond networks. The mechanism is simple: variation within the H-bond networks results in altered mechanical linkage properties that directly affect intrinsic flexibility. Moreover, varying numbers of H-bonds and their strengths control the likelihood for energetic fluctuations as H-bonds break and reform, thus directly affecting thermodynamic properties. Consequently, these results demonstrate how unexpectedly large differences, especially within cooperativity correlation, emerge from subtle differences within the underlying H-bond network. This inference is consistent with well-known results that show allosteric response within a family generally varies significantly. Identifying the hydrogen bond network as a critical determining factor for these large variances may lead to new methods that can predict such effects.

    Calculating Ensemble Averaged Descriptions of Protein Rigidity without Sampling

    Previous work has demonstrated that protein rigidity is related to thermodynamic stability, especially under conditions that favor formation of native structure. Mechanical network rigidity properties of a single conformation are efficiently calculated using the integer body-bar Pebble Game (PG) algorithm. However, thermodynamic properties require averaging over many samples from the ensemble of accessible conformations to accurately account for fluctuations in network topology. We have developed a mean field Virtual Pebble Game (VPG) that represents the ensemble of networks by a single effective network. That is, the varying number of distance constraints (or bars) that can form between a pair of rigid bodies is replaced by the average number. The resulting effective network is viewed as having weighted edges, where the weight of an edge quantifies its capacity to absorb degrees of freedom. The VPG is interpreted as a flow problem on this effective network, which eliminates the need to sample. Across a nonredundant dataset of 272 protein structures, we apply the VPG to proteins for the first time. Our results show numerically and visually that the rigidity characterizations of the VPG accurately reflect the ensemble averaged properties. This result positions the VPG as an efficient alternative to understand the mechanical role that chemical interactions play in maintaining protein stability.